SDA 4.1 Documentation for RECODE


NAME

recode - recode variables

USAGE

recode -b filename

DESCRIPTION

RECODE uses one or more existing variables as input to create a new SDA variable.

Ordinarily this program is invoked by the Web interface for the SDA programs, and the user does not have to deal with the keywords given in this document. Output from the program is in HTML, which can be viewed with a Web browser. Users who run this program interactively should see the online help document.

It is also possible to run the program directly by preparing a command file, which specifies the variables to be analyzed and the options to use. This document explains how to prepare such a file. The name of this batch command file is specified to the program after the `-b' option flag.



BATCH FILE LAYOUT

The batch file is divided into parts, which are separated by asterisks (*). The parts can be given in any order. Since the "map," category labels, and descriptive text can have varying numbers of lines, each of those parts ends with an asterisk (*) on a line by itself. The general layout is as follows:

     (Input and output definitions)

     MAP=
     (Recode map)
     *

     CATLABELS=       [optional]
     (Category text and labels)
     *

     TEXT=            [optional]
     (Descriptive text)
     *

     **

After a line containing only two asterisks (**), another set of RECODE commands can be added to the same batch file. This is how you include multiple recodes in the same file.
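As an illustration of this layout, here is a minimal Python sketch that splits a batch file into recode sets. It is not SDA's own parser: the function name `parse_batch`, the recognition of the multi-line parts by the first three letters of their keyword, and keeping keyword names exactly as typed (in lower case) are assumptions made for this example. Blank lines and lines beginning with "#" are skipped, as described under COMMENTS.

```python
# Sketch only: splits an SDA RECODE batch file into recode sets.
# Assumption: the multi-line parts (MAP=, CATLABELS=, TEXT=) are
# recognized by the first three letters of their keyword.
BLOCK_PREFIXES = ("map", "cat", "tex")


def parse_batch(text):
    """Return one dict per recode set; each set ends with a '**' line."""
    sets = []
    current = {}
    block_name = None      # multi-line part currently being read
    block_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and comment lines are ignored everywhere.
        if not line or line.startswith("#"):
            continue
        if block_name is not None:
            if line == "*":                 # end of MAP, CATLABELS or TEXT
                current[block_name] = block_lines
                block_name, block_lines = None, []
            else:
                block_lines.append(line)
            continue
        if line == "**":                    # end of one recode set
            sets.append(current)
            current = {}
            continue
        keyword, _, value = line.partition("=")
        keyword = keyword.strip().lower()
        if keyword[:3] in BLOCK_PREFIXES:
            block_name = keyword[:3]        # following lines belong to it
        else:
            current[keyword] = value.strip()  # a repeated keyword overrides
    if current:                             # assumption: tolerate a missing final '**'
        sets.append(current)
    return sets
```

Because each recode set is stored as a dictionary, a repeated keyword simply replaces the earlier entry, which matches the override rule given under ABBREVIATIONS AND REPETITIONS.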


KEYWORDS FOR RECODE SPECIFICATIONS

The specifications are given in the form "keyword = something" with one keyword per line. Keywords may be given in any order, either in upper or in lower case. The valid keywords are as follows (with significant characters shown in capital letters):

Defining Input Variables


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


STudies=      path of source dataset(s)       Look for input variables
                                               only in current directory

INvars=       name(s) of input var(s)         REQUIRED

Defining the New Variable


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________


OUTSTudy=     path of study for new variable  Current directory

OUTVar=       name of new variable            REQUIRED (see rules)

LABEL=        long label for new variable     No long label

CATlabels=    (precedes lines of category     No category text
                text - see details below)      or labels

MAP=          (precedes lines with recode     REQUIRED
               map or rules - see below)

MD=           list of invalid codes, ranges   No defined MD codes
              (also used for output value
               if input has missing data
               -- see below)

MIN=          minimum valid code              No defined minimum

MAX=          maximum valid code              No defined maximum

OVERwrite=    yes                             Do not overwrite new var
                                                if it already exists

OTHercases=   name of the input variable      Set to MD code
               from which to take the value    (or system-missing)
               for cases that do not match
               a pattern in the MAP

TEXT=         (precedes lines of descriptive  No item text
                text - see details below)

Other options


Keyword       Possible Specification          Default (if no keyword)
_____________________________________________________________________

DIAGnostics=  yes                             No diagnostic summary of
                                                the new variable

COLorcoding=  yes                             No colored headings in the
                                                diagnostic output

GVARCase=     LOWER or UPPER                  Do not convert all variable
                                                names to lower/upper case

LAnguagefile= Name of file with non-English   English labels on
                labels and messages             output

SAVebatch=    name of directory               No file preserved with batch
                                                commands to create new var
                                                (for interactive version)
                                                The batch file name is the
                                                name of the new variable,
                                                with the suffix '.rec'


ABBREVIATIONS AND REPETITIONS

Most keywords can be abbreviated. Usually only two or three characters are required. The keyword for the category text for the new variable, for instance, can be given as "catlabels=" or "catlab=" or even "cat=". Either upper or lower case may be used. If keywords are repeated, the second specification will override the first.
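The abbreviation rule can be sketched in Python as follows, assuming (as the keyword tables suggest) that the capital letters in each keyword name are the characters that must be given. A word is accepted if it begins with those letters, which also covers forms such as "study=" used in the examples below. The helper names `significant`, `resolve`, and `collect` are hypothetical. Because "label" and "languagefile" share the letters "la", the sketch tries keywords with longer required prefixes first; that tie-breaking order is an assumption, not documented SDA behavior.

```python
# Keyword names as listed in the tables above; capital letters mark
# the characters that must be given.
KEYWORDS = [
    "STudies", "INvars", "OUTSTudy", "OUTVar", "LABEL", "CATlabels",
    "MAP", "MD", "MIN", "MAX", "OVERwrite", "OTHercases", "TEXT",
    "DIAGnostics", "COLorcoding", "GVARCase", "LAnguagefile", "SAVebatch",
]


def significant(keyword):
    """Leading capital letters of a keyword name, in lower case."""
    prefix = ""
    for ch in keyword:
        if not ch.isupper():
            break
        prefix += ch
    return prefix.lower()


# Assumption: try longer required prefixes first, so that "label"
# resolves to LABEL rather than to LAnguagefile.
BY_PREFIX_LENGTH = sorted(KEYWORDS, key=lambda k: len(significant(k)), reverse=True)


def resolve(word):
    """Full keyword name for a possibly abbreviated word, or None."""
    word = word.strip().lower()
    for keyword in BY_PREFIX_LENGTH:
        if word.startswith(significant(keyword)):
            return keyword
    return None


def collect(pairs):
    """Build keyword -> value; a repeated keyword overrides the earlier one."""
    specs = {}
    for word, value in pairs:
        keyword = resolve(word)
        if keyword is not None:
            specs[keyword] = value
    return specs
```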

COMMENTS

Anything on a line beginning with "#" is ignored by the batch processor and can therefore be used for comments. Blank lines are also ignored.

RECODE MAP

The rules for combining the values of one or more input variables into a value on the output variable are contained in the recode map. First put the `MAP=' keyword on a line by itself; then put each recode rule on a separate line. The general format is as follows:

     New value: values on var 1 [; values on var 2; ... ]

Within a rule, the specifications for different input variables are separated by a semicolon (;). After the last rule, put an asterisk (*) on a line by itself. For example, to recode age and gender into 4 categories (younger male, younger female, older male, older female), one could construct the following recode map:

     map=
     1: 18-49; 1
     2: 18-49; 2
     3: 50-97; 1
     4: 50-97; 2
     *
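The rule format can be illustrated with a short Python sketch that parses one rule line and tests whether a case matches it. The function names are hypothetical, and the sketch handles only non-negative numeric codes, given as single values or ranges separated by commas; the asterisk forms described in the following paragraphs are not handled here.

```python
def parse_values(spec):
    """Parse one variable's part of a rule, e.g. '1,3-5,7', into ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        low, _, high = part.partition("-")   # '3-5' -> ('3', '5'); '7' -> ('7', '')
        ranges.append((float(low), float(high or low)))
    return ranges


def parse_rule(line):
    """Split 'New value: values on var 1; values on var 2' into its parts."""
    new_value, _, rest = line.partition(":")
    return float(new_value), [parse_values(s) for s in rest.split(";")]


def matches(specs, case):
    """True if each input value lies in one of the ranges for its variable."""
    return all(
        any(low <= value <= high for low, high in ranges)
        for ranges, value in zip(specs, case)
    )


# Rule 3 of the age-gender map above, applied to a 62-year-old with gender code 1.
new_value, specs = parse_rule("3: 50-97; 1")
print(new_value, matches(specs, (62, 1)))   # prints: 3.0 True
```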

It is possible to have more than one rule for a given output value -- notice that the output code 4 has three rules in the example given below.

     map=
     1: 1,3-5,7 ; 1-10
     2: 1,3-5,7 ; 11-50
     3: 1,3-5,7 ; 51-90
     4: 8-10,12 ; *
     4: 41,45,55; 11-90
     4: 61-90   ; *
     9:    **   ; **
     *

Each recode rule (line) can include more than one NUMERIC value or range for each input variable, as long as they are separated by commas.

A single asterisk (*) in a recode rule matches any VALID value of the corresponding input variable. Two asterisks (**) match ANY value, including missing-data (both user-defined and system-missing) and out-of-range values. When an asterisk or double asterisk is used as a stand-alone specification, and is NOT used as part of a range, it cannot be combined with other specifications for the same input variable in a recode rule (on the same line).

If a case matches more than one recode rule, the first rule encountered will apply. Notice in this example that the recode rule `**; **' matches all values of the two input variables; any cases not covered by a rule higher up in the map will receive the value 9.
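The matching rules for `*' and `**', together with the first-match principle, can be sketched in Python as follows, using the second example map written out by hand as Python data. The names are hypothetical; system-missing is represented by `None`, a value counts as valid if it is neither system-missing nor one of the variable's user-defined MD codes, and out-of-range checks are left out of this sketch.

```python
ANY_VALID = "*"   # any valid value of the input variable
ANY = "**"        # any value at all, including missing data


def is_valid(value, md_codes):
    """Sketch: valid means not system-missing (None) and not a user MD code."""
    return value is not None and value not in md_codes


def spec_matches(spec, value, md_codes):
    if spec == ANY:
        return True
    if spec == ANY_VALID:
        return is_valid(value, md_codes)
    # Otherwise spec is a list of (low, high) ranges.
    return value is not None and any(low <= value <= high for low, high in spec)


def first_match(rules, case, md_codes_per_var):
    """New value from the first rule that matches, or None if none does."""
    for new_value, specs in rules:
        if all(
            spec_matches(spec, value, md)
            for spec, value, md in zip(specs, case, md_codes_per_var)
        ):
            return new_value
    return None


# The second example map, written out by hand.
SET_A = [(1, 1), (3, 5), (7, 7)]
RULES = [
    (1, [SET_A, [(1, 10)]]),
    (2, [SET_A, [(11, 50)]]),
    (3, [SET_A, [(51, 90)]]),
    (4, [[(8, 10), (12, 12)], ANY_VALID]),
    (4, [[(41, 41), (45, 45), (55, 55)], [(11, 90)]]),
    (4, [[(61, 90)], ANY_VALID]),
    (9, [ANY, ANY]),
]
```

With 99 defined as an MD code on the second variable, `first_match(RULES, (8, 99), [set(), {99}])` skips the rule `8-10,12 ; *` because 99 is not a valid value, and falls through to the last rule, returning 9.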

RECODING CHARACTER MD AND SYSTEM MD

RECODE only works with NUMERIC variables. If, however, a numeric variable contains character values that have been defined as missing-data codes (such as `D' or `R' to mean "Don't Know" or "Refused"), RECODE can handle those values and convert them into a numeric code, if desired.

Similarly, the system missing-data code can be recoded into a numeric value by referring to it as '$.' in a recoding rule. (Note the period after the dollar sign.)

These new numeric values will be treated as valid codes, unless they are defined as missing data codes on the new variable. If you want the new values to be defined as missing data codes, use the 'MD=' keyword to define them as missing data.

One of the examples below illustrates how character MD values and the system MD value are recoded into numbers.
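A Python sketch of how character MD codes and the system MD code could be matched in a recode rule, using the map from example 4 below. Assumptions: a single input variable, character codes stored as strings such as "D", system-missing represented by `None`, and only non-negative numeric codes in the rules. The function names are hypothetical; `recode_value` returns the new value together with a flag saying whether that value is one of the codes given with 'MD='.

```python
# Map from example 4: numeric ranges, character MD codes and '$.'.
SPEND_MAP = ["1: 1-2", "2: 3", "8: D", "9: R", "-1: $."]


def token_matches(token, value):
    """Does one token of a rule (e.g. '1-2', 'D', '$.') match the value?"""
    token = token.strip()
    if token == "$.":                  # the system missing-data code
        return value is None
    if token.isalpha():                # a character MD code such as D or R
        return value == token
    if value is None or isinstance(value, str):
        return False
    low, _, high = token.partition("-")
    return float(low) <= value <= float(high or low)


def recode_value(value, map_lines, md_codes):
    """Return (new value, whether it is an MD code on the new variable)."""
    for line in map_lines:
        new_value, _, spec = line.partition(":")
        if any(token_matches(t, value) for t in spec.split(",")):
            new_value = float(new_value)
            return new_value, new_value in md_codes
    return None, True                  # no rule matched; this sketch returns system-missing
```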

CASES UNMATCHED BY THE RECODE MAP

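If a case does not match any of the recode rules, the value of the output variable depends on the options that were specified. If the 'OTHercases=' keyword names an input variable, the case receives that variable's value. Otherwise, the case is set to an MD code (or to system-missing).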
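The alternatives for unmatched cases, as listed in the keyword table for 'OTHercases=', can be sketched in Python as follows. This document does not say which MD code is used when several are defined; the sketch assumes the first code given with 'MD=', and it represents system-missing as `None`. The function name `value_for_unmatched` is hypothetical.

```python
def value_for_unmatched(case, othercases=None, md_codes=()):
    """Value for a case that matched no rule in the map (sketch).

    case       -- dict mapping input variable names to this case's values
    othercases -- variable named with 'OTHercases=', if any
    md_codes   -- codes listed with 'MD=' for the new variable
    """
    if othercases is not None:
        return case[othercases]    # carry the input value over unchanged
    if md_codes:
        return md_codes[0]         # assumption: the first MD= code is used
    return None                    # system-missing
```

For example 3 below, `value_for_unmatched({"age": 45}, othercases="age", md_codes=(98, 99))` returns 45, the original age.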

HANDLING MD IN THE DEFAULT VARIABLE

Cases that do not match the RECODE map can be assigned the values of a default variable (specified with the 'OTHercases=' keyword). If the default variable has values defined as missing data on those cases, those values are passed to the new variable, but their missing data status depends on the type of missing data each value happens to be:

CATEGORY TEXT AND LABELS

Category text and labels for one or more codes of the new variable can be supplied. First put the `CATlabels=' keyword on a line by itself; then specify on a separate line each code, followed by one or more spaces or tabs, then the category text [and short label, if desired]. (Programs such as TABLES and MEANS will use the short label for a category, if one is available.) Put an asterisk (*) on a line by itself after the last label. For example:

     catlabels=
     1 Professional and technical [Prf,Tech]
     2 Managers
     3 Blue collar workers [Blue Col]
     4 Other
     9 Missing
     *
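A minimal Python sketch of reading category lines in this format with a regular expression. It assumes numeric codes and treats a bracketed part at the end of the line as the short label; the function name `parse_catlabels` is hypothetical, and lines that do not fit the pattern are simply skipped.

```python
import re

# Code, then spaces or tabs, then category text, then an optional
# [short label] at the end of the line.
CATLABEL_LINE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s+(.*?)\s*(?:\[([^\]]*)\])?\s*$")


def parse_catlabels(lines):
    """Map each code (kept as text) to (category text, short label or None)."""
    labels = {}
    for line in lines:
        match = CATLABEL_LINE.match(line)
        if match is None:
            continue               # skip lines this sketch cannot parse
        code, text, short = match.groups()
        labels[code] = (text, short)
    return labels
```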

DESCRIPTIVE TEXT

Descriptive text may be stored with the new variable. This text can then be displayed when the variable is used in analysis programs or in a codebook. First put the `TEXT=' keyword on a line by itself; then write as many lines of text as you wish to store with the new variable. Put an asterisk (*) on a line by itself after the last line of text. One of the examples below illustrates this feature.

MULTIPLE RECODES

RECODE commands for two or more variables can be included in the same batch file. After the first set of commands is finished, indicated by a line containing only two asterisks (**), the commands for another new variable can follow. The value of the `STudies=' keyword is carried over from the previous set of commands, unless it is respecified.
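The carry-over of 'STudies=' between recode sets can be sketched in Python, assuming each set has already been read into a dictionary whose keys are full keyword names in lower case (for instance "studies"). The function name `carry_over_studies` is hypothetical; only 'STudies=' is carried over, as described above.

```python
def carry_over_studies(recode_sets):
    """Fill in 'studies' for sets that do not respecify it."""
    studies = None
    result = []
    for specs in recode_sets:
        specs = dict(specs)              # leave the input unchanged
        if "studies" in specs:
            studies = specs["studies"]   # respecified: use the new value
        elif studies is not None:
            specs["studies"] = studies   # carried over from the previous set
        result.append(specs)
    return result
```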

BACKWARD COMPATIBILITY

RECODE can read most older CSA recode commands. The following keywords are still recognized and are equivalent to the new keywords shown in parentheses:

The missing-data keywords `md1=value1' and `md2=value2' are also recognized and are equivalent to the new form: `md= value1, value2'.

Note, however, that in the CSA recode rules, a single asterisk (*) matches ALL values of an input variable. SDA distinguishes between a single asterisk, which matches only the VALID values of an input variable, and two asterisks, which match ALL values.
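When adapting an older CSA batch file, both differences can be handled mechanically. The Python sketch below, with hypothetical function names, folds 'md1=' and 'md2=' into one 'md=' value and rewrites a stand-alone single asterisk in a map rule as a double asterisk, so that the rule keeps its CSA meaning. It assumes the keywords have already been read into a dictionary and that it is given only rule lines, not the asterisk line that closes the map.

```python
def convert_csa_md(specs):
    """Fold md1= and md2= into a single md= value (kept as text)."""
    specs = dict(specs)
    values = [specs.pop(key) for key in ("md1", "md2") if key in specs]
    if values:
        specs["md"] = ", ".join(values)
    return specs


def convert_csa_rule(line):
    """Rewrite a stand-alone '*' (ALL values in CSA) as '**' for SDA.

    Pass only rule lines, not the '*' line that closes the map.
    """
    new_value, _, rest = line.partition(":")
    parts = [part.strip() for part in rest.split(";")]
    parts = ["**" if part == "*" else part for part in parts]
    return new_value.strip() + ": " + "; ".join(parts)
```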


EXAMPLES OF BATCH FILES

1. Collapse age into 3 categories

     study = /sda/testdata
     invar = age
     outvar = age3
     label = Collapsed age - 3 categories
     md = 9

     map=
     1: 18-29
     2: 30-49
     3: 50-97
     *

     catlabels=
     1 <30
     2 30-49
     3 50+
     9 missing
     *

     **


2. Recode age and gender into 4 categories

     invars = age gender
     outvar = agesex
     label = Age-gender typology
     overwrite = yes
     md = 9

     map=
     1: 18-49; 1
     2: 18-49; 2
     3: 50-97; 1
     4: 50-97; 2
     *

     catlabels=
     1 Yng Male
     2 Yng Feml
     3 Old Male
     4 Old Feml
     9 Missing
     *

     text=
     This variable is a four-category typology of age and gender
     *

     **


3. Collapse only highest and lowest values of age

(using the 'othercases=' option.)

     study = /sda/testdata
     invar = age
     outvar = age2070
     label = Collapsed age - 20-70

     # Note the use of the `othercases=' option;
     # only the codes given in the map are changed.
     # The new variable will carry over the original values
     # of 'age' for all cases not covered by the map.
     othercases = age

     # Cases that have the value 98 or 99 on 'age' are not covered in
     # the map and are assumed to be missing data.
     # This 'MD=' command specifies that cases with 98 or 99
     # will be defined as missing data also on the new variable.
     md = 98,99

     map=
     20: 1-20
     70: 70-97
     *

     catlabels=
     20 20 or younger
     70 70 or older
     *

     **


4. Convert character MD and system MD to numbers

     invar = spend
     outvar = numspend
     label = Recoded spend variable
     # Define the recoded numeric values as missing data.
     # (They could be left as valid values, if desired.)
     md = -1,8,9

     map=
     1: 1-2
     2: 3
     8: D
     9: R
     -1: $.
     *

     catlabels=
     1 A lot
     2 Not enough
     8 Don't know
     9 Refused
     -1 No data
     *

     **


CSM, UC Berkeley/ISA
April 25, 2023